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ABSTRACT 



Context. Asteroseismology has entered a new era with the advent of the NASA Kepler mission. Long and continuous photometric 
observations of unprecedented quality are now available which have stimulated the development of a number of suites of innovative 
analysis tools. 

Aims. The power spectra of solar-like oscillations are an inexhaustible source of information on stellar structure and evolution. Robust 
methods are hence needed in order to infer both individual oscillation mode parameters and parameters describing non-resonant 
features, thus making a seismic interpretation possible. 

Methods. We present a comprehensive guide to the implementation of a Bayesian peak-bagging tool that employs a Markov chain 
Monte Carlo (MCMC). Besides making it possible to incorporate relevant prior information through Bayes' theorem, this tool also 
allows one to obtain the marginal probability density function for each of the fitted parameters. We apply this tool to a couple 
of recent asteroseismic data sets, namely, to CoRoT observations of HD 49933 and to ground-based observations made during a 
campaign devoted to Procyon. 

Results. The developed method performs remarkably well at constraining not only in the traditional case of extracting oscillation 
frequencies, but also when pushing the limit where traditional methods have dilficulties. Moreover it provides an rigorous way of 
comparing competing models, such as the ridge identifications, against the asteroseismic data. 

Key words, methods: data analysis ~ methods: statistical - stars: late-type - stars: oscillations 



1. Introduction 



Seismology of solar-like stars is a powerful tool that can be used 
to increase our understanding of stellar structure and evolution. 
Solar-like oscillations in main-sequence stars and subgiants have 
been measured thanks to data collected fro m ground-based high- 
precision spectroscopy (for a review e.g.. Bedding & Kieldsen 
|2008.) and, more rece ntly, to ph otometric space-based missions 
such as CoRoT (e.g.. iMichel et al. 2008). Red giants also ex- 
hibit solar-Uke oscillations, although at lower frequencies, and 
hence require longer time series in order to resolve them (e.g., 
iDe Ridder et al.l2009l a nd reference s therein). The launch of the 
NASA Kepler mission (iKoch et al.l 12010,) definitely marked a 
milestone in the field of asteroseismology. Kepler will partic- 
ularly lead to a revolution in the seismology of solar-like oscilla- 
tors, since it will increase by more than two orders of magnitude 
the number of stars for which high-quality observations will be 
available, while allowing for long-term follow-ups of a selec- 
tion of those targets. The large homogeneous sample of data 
made available by Kepler opens the possibility of conducting 
a seismic survey of the solar-like part of the colour-magnitude 
diagram, which researchers in the field already started nam- 
ing as ensemble asteroseismology. As of the time of writing 
of this article, first results arising from the Kep ler asteroseis- 



mic programme had already been made available (Bedding et al 
2010a; ChaDhn et al. 2010; Gilliland et al. 2010 ; Hekker et al.. 
2010b; Stello et all lloiot fChristensen-Dalsgaard et all 1201 Ot 
Metcalfe etal.<i20l'a) . 



The rich informational content of power spectra of solar- 
like oscillations allows fundamental stellar properties (e.g. mass, 
radius, and age) to be determined, and the internal struc- 
ture to be constrained to unprecedented levels provided that 
individual oscillation mode parameters are measured (e.g., 
IChristensen-DalsgaardI [2004 ) . Furthermore, the measured stel- 
lar background signal provides us with valuable information on 
activity and convection. In the case of the highest signal-to-noise 
ratio (S IN) observations, for which it is possible to measure indi- 
vidual oscillation mode parameters, we expect asteroseismology 
to produce a major breakthrough on stellar structure and evo- 
lution, on topics as diverse as energy generation and transport, 
rotation and stellar cycles (e.g.. lKarolf et al.ll2009h . 

For the past few years significant work has been invested 
in making preparations for the mode parameter analysis of 
Kepler data. This analysis involves the estimation of individ- 
ual and average oscillation mode parameters, as well as esti- 
mation of parameters that describe non-resonant signatures of 
convection and activity. Examples include the w ork conducted 
in the framework of the AsteroFLAG consortium jChaphn et al.l 
2008^ and the work undertaken by the CoRoT Data Analysis 
Team (lAppourchaux et al]l2006l) . This consequently paved the 
way for the developm ent of suites of analysis tools for apph- 
cation to Ke pler data ("Hekkeret al. 2010a' 'Huber et al." 20091 
Karoff et al]l2010: Mathur et al...2010; .Mosser & Appourchauxl 



20091; ICampante et al.ll2010l) . 



In the present study we give continuity to this work by 
presenting a comprehensive guide to the implementation of a 
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Bayesian peak-bagging' tool that employs a MCMC. These 
techniques derive from the tools traditionally used in helioseis- 
mology and are in many ways an extension of the Maximum 
Likelihood Estimation (MLE) methods. This peak-bagging tool 
is to be applied to the power spectra of solar-like oscillators 
and used as a means to infer both individual oscillation mode 
parameters and parameters describing non-resonant features. 
Besides making it possible to incorporate relevant prior informa- 
tion through Bayes' theorem, this tool also allows one to obtain 
the marginal probability density function (PDF) for each of the 
model parameters (frequencies, mode heights, mode lifetimes, 
rotational splitting, inclination angle etc.). This is one of the 
main advantages of these MCMC techniques, as it not only per- 
forms well in low signal-to-noise conditions, but also provides 
reliable error bars on the parameters. Parameter space is sampled 
using a Metropolis-Hastings algorithm featuring a built-in sta- 
tistical control system that allows to automatically set an appro- 
priate instrumental law during the burn-in stage. Also included 
is parallel tempering, which increases the mixing properties of 
the Markov chain. 

The outline of the paper is as follows: We start in Sect. |2] 
by providing an overview of the theory behind the power spec- 
trum of solar-like oscillations, introducing the assumptions and 
the set of parameters needed to model the spectrum to the level 
of detail required by modern asteroseismic data. In Sect. [3] we 
describe the subjacent Bayesian statistical framework by high- 
lighting the topics of parameter estimation and model selection. 
Section|4]is devoted to the modus operandi of advanced Markov 
chain Monte Carlo methods and their implementation. In Sect.|5] 
we present a couple of examples where this tool has been applied 
to recent asteroseismic data sets, evidencing some of its capabil- 
ities and illustrating its functioning. A summary and discussion 
are presented in Sect. |6] 

2. The power spectrum of solar-like oscillations 

Solar-Uke oscillations or p modes (pressure playing the role 
of the restoring force) are global standing acoustic waves. 
They are characterized by being intrinsically damped while si- 
multaneously stochastically excited by near-surface convection. 
Therefore, all stars cool enough to harbor an outer convective en- 
velope - whose locus in a H-R diagram approximately extends 
from the cool edge of the Cepheid instability strip up to the red 
giant branch - may be expected to exhibit solar-like oscillations. 

Modes of oscillation are characterized by three wave num- 
bers: n, { and m. The radial order n characterizes the behaviour of 
the mode in the radial direction. The degree £ and the azimuthal 
order m determine the spherical harmonic describing the proper- 
ties of the mode as a function of colatitude and longitude. In the 
case of stellar observations, the associated whole-disk light in- 
tegration and consequent lack of spatial resolution strongly sup- 
press the signal from all but the modes of the lowest degree (with 
{<3). For a spherically symmetric star mode frequencies depend 
only on n and {. 



in turn be modelled by a mean spectrum profile, ^(vy; 0), de- 
scribed by the set of parameters which contain the desired 
physical information, multiplied by a random noise with a 
probability distribution with 2 degrees of freedom dWoodardl 
[l984t iDuvaU & HarvevlfTosl . This means that, at a fixed fre- 
quency bin j, the probability density, fiPj), that the observed 
power spectrum takes a particular value Pj, is related to the mean 
spectrum, ^(vf, 0), by: 



f(Pj) 



1 



■ exp 



.^(vj; 0) 



(1) 



'(v,;e) 

Very often when dealing with long time series, it is customary 
to divide the observational data set into several independent sub- 
sets, to compute their separate spectra and to average them. In 
doing so one aims at decreasing the variance in the power spec- 
trum. The average power spectrum will then obey a proba- 
bility distribution with 2s de grees of freedom, , s being the 
number of combined spectra (I Appourchauxl20()3aE : 



f(Pj) 



■ exp 



^(vj; 0) 



(2) 



(s-iy. ^(vj;&y 

Equati on Q also holds wh en binning the power spectrum over 
s bins (ADDOu rchauxl2004l) . 

We would now like to specify the likelihood function, i.e., 
the joint PDF for the data sample {Pj}. Assuming that the fre- 
quency bins are uncorrected, the joint PDF is simply given by 
the product of fiPf, 0) over some frequency interval of interest 
spanned by j: 

L(&)^Y]f(Pj;&). (3) 

Notice that we have written f{Pj', 0) to make the dependence on 
the parameters explicit. In spite of the fact that Eq. Q is valid 
for an uninterrupted data set, the same is not true when gaps are 
present in the time series. In that event, Stahn & Gizo 
have derived an expression for the joint PDF of solar-like oscil- 
lations i n complex Fou rier space, in agreement with the earlier 
work ofl Gabriell (II994 . The latter PDF explicitly takes into ac- 
count frequency correlations introduced by the convolution with 
the spectral window. 

The basic idea when employing a Maximum Likelihood 
Estimator (MLE) is to determi ne estimates so as to maximiz e 
the likelihood function (e.g., ^Toutain & App ourchauxl 1 1 9941) . 
Due to improved numerical stability, however, it is more con- 
venient, in practice, to work with logarithmic probabilities: 



i^(0) = lnL(0) 

= -2|ln^(y,; 



0) 



^(vj; 0) 



(4) 



One therefore ends up maximizing the logarithm of the likeli- 
hood function instead: 



= argmjx{if(0)) . 



(5) 



2.1. Statistics and iikeiihood function of ttie spectrum 

Stellar p modes can be modelled as stoch astically excited an d 
intrinsically damped harmonic oscillators (Kumar et alJll988l) . 
The frequency-power spectrum arising from such a system can 



' The term "peak-bagging" has become the customary name for the 
examination of individual oscillation peaks i n the field of asterosei s- 
mology. The origin of the name is explained in I Appourchauxl i2003bh . 



2.2. Modelling the power spectrum^ 

The power spectrum of a single mode of oscillation is dis- 
tributed around a mean profile with an exponential probability 



- To be precise, we will be modelling the power density spectrum 
and not the power spectrum. The former is independent of the window 
function and is obtained by multiplying the power spectrum by the 
effective length of the observational run, which can in turn be calculated 
as the reciprocal of the area integrated under the spectral window. 
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distribution according to Eq. ([T]l- As already mentioned, this 
mean profile contains the information on the physics of the 
mode. In the limit of taking the ensemble average of an infinite 
number of reahsation s of the power spectrum, it can be shown 
dAnderson et al.in"99d) that the limit spectrum thus obtained fol- 
lows in fact a standard Lorentzian profile near the resonance, i.e., 
for |v - Vol <Kyo. A Lorentzian profile is defined as: 



v;5, vo,r) - 



S 



l + ^(v-vo)2 



(6) 



where S is the mode height and Y is the mode linewidth. F is re- 
lated to the mode lifetime, r, through F - (ttt) In the case 
of solar-type stars and for low angular degree i, we can as- 
sume that F is a function of frequency alone, which is supported 
both by observat i ons of the Sun and by theoretical models (e.g. 
lAerts et al.ll2010l iDupret et al.ll2009l) . 

A power spectrum of solar-like oscillations will, of course, 
contain a myriad of modes spanning a broad range in frequency, 
superimposed on a background signal of both stellar and in- 
strumental origin. The overall limit spectrum is then given by 
the sum of the separate limit spectra arising from the differ- 
ent sources, since interference effects from beating between the 
modes average out in the limit. Notice that we are assuming that 
a mode is uncorrected with any other modes or with the back- 
ground signal. In doing so, we neglect any eventual asymmetries 
of the Lorentzian profiles (iDuvall et al.lll993l : lAbrams & Kumar! 

Nevertheless, when dealing with long time series, such 
asymmetries should be included in order to avoid biases in mode 
frequency determination. Furthermore, the presence of gaps and 
the finite length of the time series lead to a degradation of the ob- 
served power spectrum, which then results from the convolution 
of the true spectrum (i.e., the one that would be obtained were 
there no gaps) with the power spectrum of the window function 
(i.e., the spectral window). However, this problem is overcome 
by convolving the final limit spectrum with the spectral window. 

Ignoring any departure from spherical symmetry, non-radial 
modes differing only on the azimuthal number m are degener- 
ate and their profiles will be combined into a single profile, that 
of the (n, {) multiplet. Stellar rotation removes the {2{ + l)-fold 
degeneracy of the frequency of oscillation of non-radial modes, 
thus allowing for a direct measurement of the angular velocity of 
the star averaged over the region probed by these modes. When 
the angular velocity of the star, Q, is small and in the case of 
rigid-body rot ation, the freq uency of a (n, I, m) mode is given to 
first order by (lLedouxll95 



Q 



v„em = v„f + m—(l - C„f) . 
2n 



(7) 



The kinematic splitting, mQ./{2n), is corrected for the effect of 
the Coriolis force through the dimensionless quantity C„( > 0. 
In the asymptotic regime, i.e., for high-order, low-degree p 
modes, rotational splitting is dominated by advection and the 
splitting between adjacent modes within a multiplet is Vj - 
Q.l{2n). Second-order rotational effects are related to the distor- 
tion of the equilibrium structure of the star caused by centrifugal 
forces. Although negligible in the Sun, these effects are signifi- 
cant for faster solar-type rotators where these effects can cause 
non-n egligible biases on frequency determinations (e.g.. lBalloi 
12010 '). Large-scale magnetic fields may also introduce further 
corrections to the oscillation frequencies. 

Assuming energy equipartition between the components of a 
multiplet, we define the following symmetric profile for a(n,() 




960 980 
Frequency (|iHz) 

Fig. 1: Artificial power density spectrum of a ^ = singlet and a ^ = 1 
multiplet. One has = F = 3 /iHz. Notice how the £ = I multiplet splits 
into its m components. The power spectrum (grey) is distributed around 
a mean spectrum (black) with an exponential probability distribution. 



multiplet: 



m=-C 

(8) 

where Stmii) represents mode visibility within a multiplet and / is 
the inclination angle between the direction of the stellar rotation 
axis and the line of sight. The overall profile of a multiplet thus 
consists of the sum of 2/'h- 1 Lorentzian profiles regularly spaced 
in frequency and scaled in height accor ding to the ^fm(0 factor s 
(see Fig.[T]), which in turn are given by dGizon & Solankilllool : 



(e-\m\)\ 

(€ + \m\)\ 



[pl'"'(cos/)]^ 



(9) 



where P'^(x) are the associated Legendre functions. Notice that 
E/ii '^imii) - 1, meaning that S'[„{i) represents the relative power 
contained in a mode within a multiplet. 

Since we are primarily interested in performing a so-called 
global fit (e.g.. .Appourchauxet al. 2008) to the observed power 
spectrum, whereby several radial orders are fitted simultane- 
ously within a broad frequency range, we end up modelling the 
mean acoustic spectrum according to the following general rela- 
tion: 



''ma.i; '■max 



/7=«o f=o m=-e ^ " 



nt 



rnvs)' 



+ N{v), (10) 



where we have also included a profile describing the background 
signal, A^(v). Granulation, faculae and active regions might con- 
tribute to the stellar background signal, which is commonly 
modelle d as a sum of power laws describin g these physical phe- 
nomena (lHarvevlll98"5l:lAigrain etani2004 : 



N(v) 



2- 



(27rB,v)C* 



(11) 



{Alt} and {8^} being, respectively, the corresponding amplitudes 
and characteristic time-scales, whereas the {Q) are the slopes 
of each of the individual power laws. A flat component, A^, is 
needed in order to model the photon shot noise. Equation (fTTI) 
might just well incorporate any instrumental background signal. 
We refer to S„(/N(v„() as the signal-to-noise ratio (in power) of 
the multiplet {n,{). 
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Once again assuming energy equipartition between the dif- 
ferent components of a multiplet, their heights can be expressed 
as: 

(12) 



S nCm - S nl - (S'tmH) Vf a„( . 



The quantity is an estimate of the geometrical visibility of the 
total power in a multiplet (n, {) as a function of i, whereas a„( 
depends mai nly on the frequency and excita tion mechanism, i.e., 
a„c ^ a(v„i). Christensen-Dalsgaard ('2003') concisely treats this 
issue of spatial filtering. Equation (fT2l l. however, is only strictly 
valid under one assumption: When the stellar flux is integrated 
over the full apparent disc, one must assume that the weighting 
function, W, which gives the contribution of a surface element 
to the integral, is a function of the distance to the disc centre 
alone, i.e., W = W{d'), where ff is defined in an inertial frame 
with polar axis pointing toward the observer. In this case, the 
apparent mode amplitude can effectively be separated into two 
factors: (of,„(0 and V^. This assumption holds very well in the 
case of intensity measurements, since the weighting function is 
then mainly linked to the limb-darkening, whereas for velocity 
measurements departures might be observed due to asymmetries 
in the velocity field induced by rotation (see ' Ballot et al.ll2006i 
I2OO8I and references therein). See Appendix lAlfor how to com- 
pute S'e,„ii) and Vc. 

The heights of non-radial modes are commonly defined 
based on the heights of radial modes according to Eq. (fT2l i. and 
taking into account the V(/Vq ratios. Note that £ = modes con- 
stitute a sensible reference since they are not split by rotation. 
Table [T] displays the rel ative spatial response functions, Ve/Vo, 
computed according to iBedding et al.l (Il996l) . for a number of 
present and upcoming instruments/missions used when mea- 
suring solar-like oscillations. Those performing intensity mea- 
surements are the red channel of the V IRGO SPM i nstrument 
on board the SOHO spacecraft (iFrohlicheFall 1 1 995b . as well 
as the CoRoT and Kepler space missions. On the other hand, 
velocit y measurements ar e performed by the HARPS spectro- 
graph ("Mavor et al."2003) and are the purpose of the forthcom- 
ing SONG network (.Grundahl et al. . 20071 



Table 1: Relative spatial response functions, Vf/Vo, for a number of 
present and upcoming instruments/missions. Notice the increased sen- 
sitivity to ^ = 3, 4 modes in velocity. Negative values of Vc mean that the 
oscillations will appear to have reversed phases. 



Intensity 



Velocity 





VIRGO 


CoRoT 


Kepler 


HARPS 


SONG 




(862 nm) 


(660 nm) 


(641 nm)" 


(535 nm) 


(550 nm; 




1.00 


1.00 


1.00 


1.00 


1.00 


VilVo 


1.20 


1.22 


1.22 


1.35 


1.35 


V2/V0 


0.67 


0.70 


0.71 


1.02 


1.01 


V3/V0 


0.10 


0.14 


0.14 


0.48 


0.47 


V4/V0 


-0.10 


-0.09 


-0.08 


0.09 


0.09 



fix some of the parameters in order to reduce the dimension of 
parameter space. 

3. Bayesian inference 

Having set up the model of the power spectrum, we will now 
introduce the Bayesian statistical framework to be used for es- 
timating the model parameters and for comparing competing 
models. Let us start by considering a set of competing hypothe- 
ses, {//,), not necessarily mutually exclusive. We should then be 
able to assign a probability, p(Hi\D, /), to each hypothesis, taking 
into account the observed data, D, and available prior informa- 
tion, /, arising from theoretical considerations and/or previous 
observations. This is done through Bayes' theorem: 

p(mi)p(D\Hi,I) 



& = {S„(,v„e,r„e,v^,i,Ak,Bi„Ck,N} . 



(13) 



We have described in detail how the modelling of a power 
spectrum of solar-like oscillations can be achieved. When actu- 
ally fitting a model to an observed power spectrum, the set of 
parameters entering the model might differ from the one repre- 
sented in Eq. (fljT l. Moreover, it might be desirable to justifiably 



p{Hi\D,I) 



(14) 



p{D\I) 

The probability of the hypothesis //, in the absence of D is called 
the prior probability, p(Hi\I), whereas the probability includ- 
ing D is called the posterior probability, p{Hi\D, I). The quan- 
tity p(D\Hi,I) is called the likelihood of p{D\I) being the 
global likelihood for the entire class of hypotheses. Bayesian in- 
ference thus encodes our current state of knowledge into a poste- 
rior probability concerning each member of the hypothesis space 
of interest. Moreover, the sum of the posterior probabilities over 
the hypothesis space of interest is unity, and thus 



p(D|/) = ^p(//,|/)p(D|//,-,/). 



(15) 



3. 1 . Parameter estimation 

Very often a particular hypothesis, i.e., a model of the power 
spectrum, is assumed to be true and the hypothesis space of in- 
terest then relates to the values taken by the model parameters 0. 
These parameters are continuous, which means that the quantity 
of interest is a PDF. The global likelihood of model M, assumed 
true, is now given by the continuous counterpart of Eq. ( fTSl i: 



p{D\I)^ p(@\I)p(D\&,I)d&. 



(16) 



Let us restate Bayes' theorem in order to account for this 
new formalism: 



p{&\D,I) 



p{@\I)p(D\&,I) 



(17) 



" Calculated as the weighted mean over the spectral response function. 



Finally, a possible set of parameters going into the model is 
given by: 



p{D\I) 

where we have substituted the hypothesis, with the param- 
"eters of the model that is assumed true. The terms entering this 
equation have the same meaning as the corresponding terms en- 
tering Eq. (fl4l i. Use of Eq. (fTTT i allows one to obtain the full joint 
posterior PDF, p{&\D, I), this being the Bayesian solution to the 
problem of parameter estimation in contrast to traditional point 
"estimation methods (e.g. MLE). The procedure of marginalisa- 
tion makes it possible to derive the marginal posterior PDF for 
a subset of parameters ©a, by integrating out the remaining pa- 
rameters ©B, called nuisance parameters: 



10,/) = / 



p{&A,&s\D,I)d&s. 



(18) 



Furthermore, assuming that the prior on ©a is independent of 
the prior on the remaining parameters, then by applying the 
product rule we have: 



p(©a,©b|/) = pi@A\i)p(®B\eAJ) = p(&a\i)p{®b\i) . 



(19) 
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We will be working, in practice, with logarithmic probabili- 
ties. The global likelihood of the model plays the role of a nor- 
maUsation constant and we rewrite Eq. ( fTTT i as follows: 



In p(@\D, I) = const. + In p(&\I) + ^(0) . 



(20) 



3.2. Model comparison 



We might also be facing a situation wherein several parametrized 
models are available to describe the same physical phenomenon. 
We then expect Bayes' theorem to allow for a statistical compar- 
ison between these competing models. In fact, Bayesian model 
comparison has a built-in Occam 's razor, a principle also known 
as lex parsimoniae, by which a complex model is automatically 
penalised, unless the available data justifies its additional com- 
plexity. Notice that these might be intrinsically different models 
or similar models with varying number of parameters, or even 
the same model with different priors for its parameters. 

Given two or more competing models, {M,), and our prior 
information, /, being in the current context that one and only 
one of the models is true, we can assign individual probabilities 
similarly to what has been done in Eq. (fT4l l. after substituting //, 
with M, : 

MM,.|D,/)=/'^^'l^>^^^l^"^> 



p{D\I) 



(21) 



where p(D\Mi, I), also called the evidence of model M,, is given 
by Eq. (fTSI l. The problem of model comparison is therefore anal- 
ogous to the problem of parameter estimation as can be seen by 
comparing Eqs. ( fTTT i and ( 1211 1. 

Of particular interest to us will be calculating the ratio of the 
probabilities of two competing models. 



p(Mi\D,I) ^ pmr)p(D\Mi,I) ^ p(Mi\I) 
p(Mj\D,I) p(Mj\[)p(D\Mj,I) p{Mj\iy 



Bi 



(22) 



where Oij is the odds ratio in favour of model M, over model Mj, 
Bij is the so-called Bayes' factor and the remaining factor is the 
prior odds ratio. We will always assume that we have no prior 
information impelling us to prefer one model over the other, and 
hence p(Mi\I)/p(Mj\I) = 1. One is now naturally in need of a 
scale by which to judge the ratio of the evidences of two com- 
peting rnodels . The usual scale employed is the Jeffreys' scale 
(iJeffrevsll 19611) . which we display in Table|2]for convenience. 



Table 2: Jeffreys' scale. 



In Ojj Strength of Evidence 



< 1 Not worth more than a bare mention 

1-2.5 Significant 

2.5-5 Strong to very strong 

>5 Decisive 



3.3. Ignorance priors 

The main advantage of the Bayesian framework when compared 
to a frequentist approach is the ability to incorporate relevant 
prior information through Bayes' theorem and evaluate its effect 
on our conclusions. Assuming that the prior on each parameter is 
independent of the prior on any other parameter, then according 
to Eq. ( fT9l ) we have: 



M0|/) = ]~[/t(0,), 



(23) 



where fk(&k) is the prior PDF associated with the ^th parame- 
ter entering the model. As our state of knowledge of a particular 
physical phenomenon evolves through continued study and ex- 
perimentation, the set of priors relevant for the analysis of a new 
data set will change. In the early stages of research, however, 
we look for a set of priors that encode our rathe r limited state of 
knowledge, i.e., a set of ignorance priors (e.g.. lGregorvll2Q05aL 
and references therein). 

When dealing with location parameters, e.g. {v„f} in 
Eq. ( fni ). our choice of prior would at first be the uniform prior: 



fki®k) = 



b 



for ©mm 



<0* 
otherwise . 



- ^k ' 



(24) 



If we are ignorant about the limits ©™" and 0™", then we re- 
fer to fk(&k) as an improper prior, meaning that it is not nor- 
malised. An improper prior is not suitable for model comparison 
problems. On the other hand, when dealing with scale parame- 
ters, e.g. {S„(} in Eq. (fT3l l. our choice of prior might be that of a 
Jeffreys ' prior: 



f,(@,)^M^T 

b 



, for 0f " 



<&k< 0^" . 



, otherwise . 



(25) 



By employing a Jeffreys' prior we are assigning equal probabil- 
ity per decade (scale invariance), mainly useful when the prior 
range spans several orders of magnitude. In case the prior lower 
limit includes zero, a modified Jeffreys ' prior should be used in- 
stead to avoid the divergence at zero: 



i 7 x r/ ^ T , for < < 0"" . 

/i(0*) = U®'+®r)i4(®r+®r)/®r] 

, otherwise . 



(26) 



For 0t»0™", Eq. ( l26l l behaves just like a Jeffreys' prior, whereas 
for 0^ <K 0™' it behaves like a uniform prior, thus not diverging 
at zero. 0V" marks the transition between the two regimes. 



Furthermore, the Bayesian framework makes it possible to 
extract parameter constraints even in the presence of model 
uncertainty, i.e., when the implementation of model selection 
has not been successful. This is done by simply combining the 
probability distribution of the parameters within each individ- 
ual model, weighted by the model probability. This procedure, 
called Bayesian model averaging (see Liddle 2009. and refer- 
ences therein), is an analogue of the superposition of eigenstates 
of an observable in quantum mechanics. 



4. Markov chain Monte Carlo 

After inspection of Eq. ( fTSl ), the need for a mathematical tool 
that is able to efficiently evaluate the multi-dimensional integrals 
required in the computation of the marginal posteriors becomes 
clear. This constitutes the rationale behind the method known as 
Markov chain Monte Carlo, first introduced in the early 1950s 
by statistical physicists and nowadays widely used in all areas of 
science and economics. 
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4.1. Metropolis-Hastings algorittim 

The aim is to draw samples from the target distribution, 
p{&\D, I), by constructing a pseudo-random walk in model pa- 
rameter space such that the number of samples drawn from a 
particular region is proportional to its posterior density. Such a 
pseudo-random walk is achieved by generating a Markov chain, 
whereby a new sample, X,+i, depends on the previous sample^, 
Xf, in accordance with a time-independent quantity called the 
transition kernel, p{Xi+i\X,). After a burn-in phase, p(Xi+[\Xi) is 
able to generate samples of with a probability density con- 
verging on the target distribution. The Markov chain must fulfil 
three requirements in order to achieve this conv ergence: it mu st 
be irreducible, aperiodic and positive recurrent (lRobertslll996) . 

The algorithm that we employ in order to genera te a M arkov 
chain was initially proposed by Metropolis et al. (1953), and 
subsequently generalised bv .Hastings, (.197 0). this latter version 
being commonly referred to as the Metropolis-Hastings algo- 
rithm. It works in the following way: Suppose the current sam- 
ple, at some instant denoted by t, is represented by X,. We would 
like to steer the Markov chain toward the next sampling state, 
Xi+i, by first proposing a new sample to be drawn, Y, from a 
proposal distribution, q(Y\Xt), centred on X,. Here we specif- 
ically treat q(Y\Xf) as being a multivariate normal distribution 
with covariance matrix Z. We employ independent Gaussian pa- 
rameter proposal distributions and thus Z is assumed diagonal. 
The proposed sample is then accepted with a probability given 
by: 



a(Xt, Y) = min(l, r) = min 



piY\D,I) q{X,\Y) 



' p{X,\D,I) q{Y\X,) 



(27) 



where a{X,, Y) is the acceptance probability and r is called the 
Metropolis ratio. In the present case of a symmetric proposal 
distribution, we have q{X,\Y) - q{Y\Xi). As a result, if the poste- 
rior density for the proposed sample is greater than or equal to 
that of the current sample, i.e., p{Y\D, I) > p{Xt\D, I), then the 
proposal will always be accepted, otherwise it will be accepted 
with a probability given by the ratio of the posterior densities. If 
Y is not accepted, then the chain will keep the current sampling 
state, i.e., Xr+i=Xt. The procedure just described is repeated for 
a predefined number of iterations or, alternatively, for a num- 
ber of iterations dete rmined by a convergenc e test applied to the 
Markov chain (e.g., iGelman & Rubinlll992l) . The total number 
of iterations is denoted by «it. 

Once a Markov chain has been created, the problem of 
marginalization becomes trivial, as the way to extract informa- 
tion on the individual parameters is simply to generate a his- 
togram for each parameter and thus obtain its PDF. An appropri- 
ate number of bins in the histograms can be selected using for 
example Scott's criterion (IScottlll979h . Usually the information 
in the PDF will be condensed using some summary statistics, 
like for example finding the median of the distribution and the 
68% credible region around it. 



^ A remark on the notation: X, may be thought of as a single vector 
in parameter space. 
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Fig. 2: Schematic representation of the functioning of the parallel tem- 
pering mechanism, whereby tempering chains are allowed to swap their 
parameter states (swaps are indicated by vertical arrows). 



4.2. Parallel tempering 

The Metropolis-Hastings algorithm outlined above might be- 
come stuck in a local maximum of the target distribution, thus 
failing to fully explore all regions in parameter space containing 
significant probability. A wa y of overcoming this is to employ 
parallel tempering (e.g.. lEarl & DeemH2005i) . whereby a discrete 
set of progressively flatter versions of the target distribution are 
created by introducing a temperature parameter, In practice, 
use is made of its reciprocal, /3-l/^, referred to as the temper- 
ing parameter. By modifying Eq. (fTTl l. we generate the tempered 
distributions as follows: 



p(®\D,/3, 1) oc p(@\I) p(D\@, if, < yS < 1 



(28) 



For yS= 1, we recover the target distribution, also called the cold 
sampler, whereas for fi< \ the hotter distributions are effectively 
flatter versions of the target distribution. Drawing samples from 
a hotter, i.e., flatter, version of the target distribution will allow, 
in principle, to visit regions of parameter space containing sig- 
nificant probability, otherwise not accessible to the basic algo- 
rithm. The problem of parameter estimation obviously contin- 
ues to rely on samples drawn from the cold sampler. In Sect. 14.41 
we describe how samples drawn from the remaining tempered 
distributions are useful in evaluating Bayes' factor. 

Implementation of parallel tempering works in the following 
way: Several versions of the Metropolis-Hastings algorithm are 
launched in parallel (np in total), each being characterised by a 
different tempering parameter, j6, . At random intervals, compre- 
hending a mean number (wswap) of iterations, a pair of adjacent 
chains, labelled with /?,■ and /?,+!, is randomly chosen and a pro- 
posal is made to swap their parameter states. The proposed swap 
is then accepted with a probability given by: 



^swap — min(l, ^swap) 

piXu^i\D,f5i,I)p(X,j\D,f5M,I) 



min 



1, 



p(X,j\D,/3i,I)p(X,j^i\D,/3M,I) 



(29) 



where, at instant t, chain is in state X,j and chain j6,+i is in 
state Xij+i. By running such a set of cooperative chains, we ef- 
fectively enable the algorithm to sample the target distribution 
in a way that allows for both the investigation of its overall fea- 
tures (low-j6 chains) and the examination of the fine details of a 
local maximum (high-/? chains). A schematic representation of 
the functioning of the parallel tempering mechanism is shown in 
Fig.|2] In Fig. [3] a version of the Metropolis-Hastings algorithm 
is shown, written in pseudocode, and with the inclusion of the 
parallel tempering mechanism. 

Concerning the values taken by the tempering parameter, 
{J3i}, optimal values are chosen in order to achieve a swap ac- 
ceptance rate between adjacent levels of ~ 50%. Heuristically, 
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1 : procedure Parallel Tempering Metropolis-Hastings 

2: Xo,, =Xo, 1 < (■ < n/j 

3: forf = 0, l,...,ni, -Ido 

4: for i = 1, 2, ... do 

5: Propose a new sample to be drawn from a 

proposal distribution: Y ~ N(X, j; Z,) 

6: Compute the Metropolis ratio: 

In r = In p(Y\D,/3i, I) - In p{X,j\D,j3i, I) 

7: Sample a uniform random variable: 

U, ~ Uniform(0, 1) 

8: if In f/i < In r then 

9: = Y 

10: else 

12: end if 

13: end for 

14: Ui ~ Uniform(0, 1) 

15: if 6^2 < l/^swap tlien 

16: Select random chain: 

/ ~ UniformInt(l, - 1) 
17: Compute rs„ap: 

Inr^wap = lnp(X„+i\D,p,J) + \np(X„\D,fiM,r) 
- In p{X„\D,p„ I) - In p{X„^i\D,pM, I) 
18: Ui~ Uniform(0, 1) 

19: if In 1/3 < In r^^.^^ then 

20: Swap parameter states of chains / and i -1- 1: 

21: end if 

22: end if 

23: end for 

24: return ;C,_, , Vf, (:y8, = l 

25: end procedure 



we can assert that by employing a geometric progression (cf. 
iBenomar et"ani2009at) . 

Pi = /l'-' , (30) 

such a desideratum is reached by setting A~\.2. The number of 
chains, n^, should be chosen such as to reach a desired balance 
between sampling efficiency and computational time. However, 
when using the parallel tempering mechanism in model compar- 
ison problems, as we will get back to in Sect. 14.41 a large number 
of tempering chains are needed (typically iij} > 10). The value of 
"swap should be chosen inversely proportional to np (typically a 
few dozens). 

4.3. Automated MCMC 

So far we have not mentioned the need to adequately choose 
the set [cr] of diagonal elements of the Z matrix, indicating the 
width of the Gaussian proposal distribution for each parame- 
ter. The set of individual cr values specifies the direction and 
step size in parameter space when proposing a new sample to 
be drawn. The optimal choice of [cr] is closely related to the 
average rate at which proposed state changes are accepted, the 
so-called acceptance rate. Accordingly, small cr values will lead 
to a large acceptance rate, with successive samples being highly 
correlated and ultimately requiring a large number of iterations 
in order to yield equilibrium distributions of model parameters. 
On the other hand, large cr values will lead to a low acceptance 
rate, meaning that proposed state changes will seldom be ac- 
cepted. This is illustrated in Fig. [H where the same simplified 
target distribution is sampled by three chains, each being char- 
acterised by a set of cr values differing on the respective mag- 
nitudes. Roberts et al. (1997) recommend, based on empirical 
studies, calibrating the acceptance rate to ~ 25% when dealing 
with a high-dimensional model as it is the case when performing 
a global peak-bagging. 

One could, of course, employ a trial-and-error approach and 
manually calibrate the cr values. However, since we are dealing 
with a large number of parameters that, in addition, correspond 
to several different physical quantities, this would quickly be- 
come very time-consuming and impractical. We instead employ 
an automated process of calibration of the proposal cr values, 
which is bas ed on a s t atistica l control system similar to the one 
described in lGregorvl ( l2005bh . The control system makes use of 
an error signal to steer the selection of the cr values during the 
burn-in stage of a single parallel tempering MCMC run, acting 
independently on each of the tempered chains. The eiTor signal 
is proportional to the difference between the current acceptance 
rate and the target acceptance rate. As soon as the eiTor signal 
for each of the tempered chains is less than a measure of the 
Poisson fluctuation expected for a zero mean error(computed as 
the square root of the target acceptance rate times the number of 
iterations between changes in the cr values), the control system 
is turned off and the algorithm switches to the standard parallel 
tempering MCMC. In practice this effectively marks the end of 
the burn-in stage. 

The control sy s tem a s briefly d escribed here is also us ed in 
iGruberbauer et alJ (l2009l) . whereas 'Beno mar et al.l (l2009ah em- 
ploy a self-learning process that appropriately adapts the covari- 
ance matrix, assumed non-diagonal. 



Fig. 3: Version of the Metropolis-Hastings algorithm written in pseu- 
docode and with the inclusion of parallel tempering. 
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Fig. 4: The same target distribution sampled by three chains, each be- 
ing characterised by a different set jcrj. The contours map the target 
distribution, which in turn depends only on the parameters 0i and 02. 
The starting point in each of the chains is (0i,02) = (-4.5,-4.5) and 
all contain 2000 iterations. Both parameters share a common cr-value 
whose optimal setting is tr = 1 . It is important to note that given a suffi- 
ciently large number of iterations, all the chains would eventually map 
out the target distribution, however an optimal choice of the proposal 
distribution will result in significantly faster convergence. 

4.4. Model comparison using parallel tempering MCMC 

We are now interested in computing the odds ratio, Otj, in 
favour of model M, over model Mj according to Eq. ( l22l i. When 
analysing solar-like oscillations, a recurrent difficulty is to cor- 
rectly tag the modes of oscillation by angular degree £. There 
are two possible ways of tagging the modes or, equivalently, two 
competing models. Computation of Oij is thus a means of assess- 
ing which of the two identification scenarios is statistically more 
likely (although not necessarily physically more meaningful, as 
is often misinterpreted). 

Samples drawn from the tempered distributions can, in prin- 
ciple, be used to compute the global likelihood, p(D\Mi,I), of a 
given model Mj. Notice that Bayes' factor, B/j, is defined as the 
ratio of the global likelihoods of two competing models: 

p{D\Mi, I) ^ i^j^ p(D\Mi, I) - In p(D\Mj, I)] . (31) 



Bij = 



p{D\Mj,I) 



It can be shown that the global likeli hood of a model is given by 
(for a derivation see lGregorvll2005bl) : 

lnp(Z)|M„/)= r {\np{D\Mi,X,r))pdl3, (32) 
Jo 

(In p{D\Mi, X, = ^ 2 ^" ^^-^l^" ^'-8' ^^^^ 



where 



is the expectation value of the natural logarithm of the likeli- 
hood for a particular tempered chain characterised by (3. The set 
{Xtfi] represents the corresponding samples drawn after the burn- 
in stage, while n is the number of samples in each set. A suffi- 
cient number (>10) of parallel tempered chains is required if we 
are to estimate the integral in Eq. (l32T i by interpolating values of 
{\np(D\Mi,X,r))p. 



5. Examples 

In the following we will pick a couple of examples where 
we have applied the described Automated Parallel Tempering 
MCMC formalism to recent measurements of solar-like oscilla- 
tors. 



5.1. HD 49933: The importance of priors 

We have performed an analysis of the star HD 49933, based on 
180 days of photometry from the CoRoT satellite arising from 
two runs: The initial 60-day run, IRaOl, and 120 days from the 
longer second run, LRaOl. The time series was split up into seg- 
ments of 30 days and the power spectra of the individual seg- 
ments were averaged to construct a mean power spectrum {s-6 
in Eq.|2l). The acoustic spectrum of this F5 main-sequence star 
has proven to be very difficult to interpret mainly due to the rel- 
atively large linewidths (see Fig.|5]l. We assume the ridge identi- 
fication denoted as "Scenario B" in lBenomar et al.l ( l2009bl) . 

The acoustic spectrum was fitted using the APT MCMC for- 
malism, but using two different sets of priors (see Table |3]l. The 
first set (SI) was constructed using only ignorance priors, while 
the second set (S2) includes knowledge about the stellar rota- 
tion. From spe ctroscopic and asteroseismic studies of HD 49933, 
iBruntll (120091) was able to constrain the rotation of the star to 
vsin(0 = 10+ Ikms^' and the radius to /^/Rq = 1.385±0.031, 
which can be combined to impose a constraint on the projected 
rotational splitting, v* = Vs sin(/), of 1.65 ± 0.17 /iHz. In set 
S2 this knowledge is added as a gaussian prior on the projected 
splitting of the star In both cases the fits were done using the 
following configuration: 

- 15 orders were fitted with { = 0, 1, 2 modes in a fitting win- 
dow spanning from 1220 to 2465 /iHz (see Fig. |5]l. 

- One linewidth and one height per order assigned to the £-Q 
mode, and then linearly interpolated by frequency and scaled 
to the higher degree modes. 

- Rotation and inclination angle fitted with the two free param- 
eters, V* and /. 

- The background was parametrized as a sum of 3 Harvey-like 
models plus a white noise contribution. 

- 800000 samples were drawn from the target distribution, 
employing 10 parallel chains. 

First of all, it is important to note that the results are con- 
sistent with the ones reported in' Benomar et al.l ( 12009 b'). For ex- 
ample the derived frequencies and linewidths are all well within 
the error bars. We will here focus on the results of the rotational 
splitting and inclination angle. The probability density functions 
for the fitted parameters when using ignorance priors (SI) are 
shown in Fig. |6a]and, after applying the Gaussian prior on the 
projected splitting (S2), the results change to the ones shown in 
Fig.Hb] 

Table 3: Prior input for the HD 49933 analysis. 



Parameter 


Prior 


Frequencies 


Uniform 


Heights 


Modified Jeffreys 


Linewidths 


Uniform 


Inclination 


Uniform (0°-90°) 


Rotation 


SI: Uniform on (0-lOyuHz) 




S2: Gaussian on v* (1.65 ± 0. 17yuHz) 
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(a) (b) 

Fig. 6: Results on the rotation and inclination angle of HD 49933. In 
both cases the prior on the inclination angle is uniform in the interval 
0°-90°. [(a)l We employ a uniform prior on rotation (Sl). |(b)| We employ 
a Gaussian prior on projected rotation (S2). The vertical lines in the 
histograms indicate the median and the boundaries of the 68% credible 
regions of the distributions. The dashed line in the top figures indicates 
the frequency resolution in the spectrum. 



What these results demonstrate is, first of all, that these tech- 
niques are extremely efficient at probing and constraining pa- 
rameters which traditional methods vyould have con siderable dif- 
ficulty in constraining. In iBenomar et alj (l2009bl) . where simi- 
lar techniques are employed, the derived inclination angle was 
n°tlo and the rotational splitting placed in the range 3.5- 
6.0;uHz, with which our results are perfectly consistent with. 
Another point to be drawn from this example is the importance 



of the inclusion of prior knowledge. By incorporating our prior 
knowledge about the rotation of the star through the accurate 
measurements from a spectral analysis, we are able to yield 
a much cleaner constraint on the inclination angle, which in 
Fig. |6a]has a considerable tail of probability towards large in- 
clinations. 

It is important to note that this prior on v* is strong in the 
sense that it dominates the fit. This simply comes from the fact 
that the data do not provide any further information on this pa- 
rameter and so our prior knowledge of the model is still pro- 
viding the best constraint. In such cases one of course has to be 
careful that such a strong prior is not wrongfully restrictive. If 
other effects (in this case, for instance, differential rotation or 
second-order rotational effects) were present, this could intro- 
duce biases to the fitted results. This could of course be tested 
using the methodology described in Sects. l3T2l and 14.41 by con- 
structing models that incorporate these effects and testing their 
significance. 

5.2. Procyon: The problem of ridge identification 

Here we address the issue of tagging the oscillation modes by 
angular degree in the case of the F5 star Procyon. We have 
thus reanalysed the data acquired during a multi-site campaign 
dArentoft et al.ll2008t [Bedding et al.1 1201 Ob i) carried to observe 
oscillations in this star. The data consist of high-precision ve- 
locity observations obtained over more than three weeks with 
eleven telescopes, representing the most extensive campaign or- 
ganised so far on a solar-type oscillator. 

The problem of ridge identification in F stars dates back 
to when CoRoT observati ons of HD 49933 were first analysed 
(lAppourchaux et aT] l2008h . a problem that would be recently 
solved for this star only after a new l onger time se r ies was 
made available ( iBenomar et al.ll2009bl) . iBedding et all ( 1201 Obi) 
address this same problematic in the case of Procyon by em- 
ploying three distinct methodologies: (i) a collapsed power spec- 
trum along several radial o rders, (ii) a scaled echelle diagram 
(iBedding & Kieldsenl l2010') and (iii) Bayesian model compari- 
son (as described in Sect. l4.4l i. The last-mentioned methodology 
statistically favours their Scenario A over their Scenario B iden- 
tification, whereas the first and second methodologies suggest 
the contrary although without quantifying their preference for 
Scenario B in a statistical sense. 

We performed a peak-bagging of the power spectrum of 
Procyon considering both identification scenarios while simul- 
taneously testing for the presence of ^ = 3 modes. This gives 
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a total of four competing models, i.e., {Ma,M^^^,Ms,M^^}, 
the notation chosen to be unambiguous. The details of the peak- 
ba gging as implemented h ere slightly differ from those presented 
in Bedding et al.l (1201 Obi) , and especially concern the limits of 



the fitting window and the way in which the background was 
parametrized. The details are as follows: 

- The peak-bagging was performed on the sidelobe-optimised 
power density spectrum whose intrinsic frequency resolution 
is 0.77 fiHz. Peaks were described by symmetric Lorentzians 
centred on the mode frequencies. Three frequencies were 
fitted per overtone, each with a different angular degree 
({ = 0, 1,2). Type of prior imposed at first: independent and 
uniform, centred (±8 yuHz) on the initial guesses. The mode 
frequencies were further constrained to lie close to the ridge 
centroids and to have only small jumps from one order to 
the next. Also, a Gaussian prior (fi-A- fjHz, cr-5 //Hz) was 
imposed on the small frequency separation, <5vo2, between 
adjacent modes with { - Q and { = 2. The small separation 
was not itself a free parameter in the fit, but instead a derived 
quantity. Note that the last-mentioned constraint implies that 
the type of prior on the ^ = 0,2 frequency parameters is ulti- 
mately not independent nor uniform. Optionally, modes with 
{ = 3 could be included in the model with their frequencies 
fixed to 

v„-i,3 = v„,i - |(v„.o - v„_i,2) , (34) 

according to the asymptotic relation ( lTassoul|[l980h . A total 
of 14 overtones were considered and the fitting window runs 
from 500 to 1300yL(Hz. By employing this construction it is 
assumed that no mixed modes are present in the fitting win- 
dow. The inclusion of /' = 3 modes does not add any more 
free parameters, while adding however their features to the 
model spectrum which are derived from the fitted { = 0,1,2 
frequencies. 

- The linewidth was parametrized as a linear function of fre- 
quency, defined by two parameters Feoo and Fnoo, which are 
the values at 600 and 1200 /iHz, respectively. Both parame- 
ters were fitted. Type of prior imposed: uniform in the range 
0-lOyuHz. 

- The heights of ra dial modes in units of power density were 
fixed according to lChaplin et al ] (l2008bh : 



S nO - 



2A^T 

7!-rr„o + 2 ' 



(35) 



where is the total power of the mode as determined from 
the power envelope for radial modes (Kieldsen et al. 2008), 
and T is the effective length of the observational run. The 
heights of non-radial modes were then defined based on the 
heights of radial modes according to Eq. (fT2l l. and taking 
into account the appropriate Vf/Vo ratios given in table 1 of 
[Kieldsen et al. (2008). 

The background was parametrized as a linear function of fre- 
quency since it had previously been suppressed at low fre- 
quencies (high-pass cut at 280 //Hz) to effectively remove 
slow variations. 

The inclination angle between the direction of the stel- 
lar rotation axis and the line of sight was fixed at 31.1 °, 
which is the inclination of the binary orbit and is consis- 
tent with the rotational modulation of the velocity curve. 
The rotational splitting was fixed at 0.7 /iHz, which was cho- 
sen to m atch the observed value of vsin(0 = 3.16 km s"' 
dAllende P rieto et al. 200^ g iven th e known radius of the 
star of 2.05 R© (.Kervellaet al.ll2004 . 



- We drew ~ 800 000 samples from the target distribution after 
a burn-in phase. We employed 12 parallel tempered chains. 

- We thus have a total of 46 free parameters, namely, 42 fre- 
quencies, 2 parameters for the linewidth and 2 parameters 
for the background. 

Table|4]summarises the model selection calculations assum- 
ing equal prior probabilities for the models belonging to our dis- 
crete model space. Individual probabilit ies are assigned to mod - 
els according to Eq. (jJl)- Similarly to 'B edding etal] (1201 Obh . 
Bayesian model comparison again statistically favours Scenario 
A over Scenario B. Furthermore, the presence of residual power 
due to ^ = 3 modes is suggested. Computing Bayes' factor in 
favour of model M^"^ over model gives a factor of approx- 
imately 9:1 or, equivalently, a logarithmic factor of 2.2, which 
classifies as 'significant' on Jeffreys' scale. Figure|2]displays the 
power density spectrum of Procyon in echelle format with the 
fitted frequencies for model M'f^ overlaid. 



Table 4: Model probabilities. 



Model 


In p(D\Model, I) 


Probability 


Ma 


2789.723 


39.25% 




2790.046 


54.23% 


Mb 


2785.806 


0.78% 




2787.801 


5.74% 
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a. 
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I'requency modulo 56.0 /xHz 



Fig. 7: Power density spectrum of Procyon (smoothed to 2/jHz) in 
echelle format. The fitted frequencies for model M'^^ appear as over- 
laid filled symbols. Symbol shapes indicate mode degree: ( = (circles), 
f = 1 (triangles), £ = 2 (squares). 
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6. Summary and discussion 

In this paper, we have presented the basic theory and methods 
behind the extraction of parameters from the power spectra of 
solar- like stars. In order to handle the ever rising quality and 
complexity of modern asteroseismic data, we have developed 
a tool (APT MCMC) that enables us to constrain parameters 
associated with the subtlest features in the spectra. The algo- 
rithm has been extensively tested and performs extremely well, 
not only in the traditional case of extracting oscillation frequen- 
cies, but also when pushing the limit where traditional methods 
have difficulties, such as constraining linewidths, rotational split- 
tings and stellar inclination angles. In this work we have focused 
on data in the signal-to-noise regime of cuiTent asteroseismic 
measurements. In the case of very high signal-to-noise ratios, 
other features in the power spectrum becomes important, such as 
mode asymmetries and rotational splittings dependent on {, aris- 
ing from differential rotation with radius. In future work these 
effects will be incorporated into the program and tested on solar 
data. 

One disadvantage of the method is that it can be quite com- 
putationally intensive, both to implement and run, when com- 
pared to traditional MLE fits. This is however balanced by the 
much added information outputted from the fits, specifically in 
the probability distributions of each parameter, making it easy 
to obtain accurate, reliable and realistic error bars on the results 
- a feature seriously missing from the traditional methods. The 
parameter estimation also benefits enormously from the possibil- 
ities the Bayesian formalism provides with inclusion of prior in- 
formation. This not only allows control of the fit to, for example, 
not allow unphysical parameter combinations, but also include 
information into the fit that is better constrained by other mea- 
surements (as we saw in Sect. 15.1b . Another powerful feature of 
the method lies in the parallel tempering, which not only keeps 
the fits from getting stuck in local maxima, but also provides 
an objective way of comparing different competing models, as it 
provides a way of calculating the global likelihood. This can for 
example be utilized in the familiar problem of ridge identifica- 
tion in solar-like stars (see Sect. 15.2b . 

A thing to keep in mind is also that the APT MCMC al- 
gorithm is completely general, in the sense that it could be ap- 
plied to other problems without modification. MCMC methods 
are being use d in various branches of astrophysics: cosmology 
(lLiddlell2009h . extra solar planets (" Gregory! l2005al) and stellar 
model fitting (iBazot et al.l l2008). but in fact the methods would 
be applicable in any problem including parameter estimation. 
And as computational power continues to grow, the downsides 
are quickly becoming insignificant. 

What could to some extent also be seen as a disadvantage of 
these methods is that they can never be fully automated, in the 
sense that they will not be able to handle a large number of stars 
without human interaction. The whole fundamental idea behind 
the Bayesian formalism is that it relies on "wise" human inputs 
on the priors and model setup that should not be done in an au- 
tomated way. If nothing else, take this as a positive reassurance: 
You will, as an astrophysicist, never be obsolete to computers or 
monkeys with keyboards. 
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the observational method. This matrix product can be used for 
velocity measurements by setting c = 1 and for intensity mea- 
surements by setting c = 0. 



Appendix A: Computing S[m{i) and V[ 

The ^cmii) factors are given below for ( e [0,4], having used 
Eq. @: 
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Notice that when the rotation axis is aligned with the line of sight 
(i = 0°), only the multiplet component with m = is visible, thus 
making inviable an inference of rotation. 

The spatial response function for each {, Ve, representing the 
ratio of the observed amplitude to the actual amplitude, is given 
here for the five lowest degree modes (.Bedding et al...l996.) : 
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where U2 and V2 ar e waveleng th-dependent classical limb- 
darkening coefficients (lAllenll973il and c is a parameter defining 
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